Integrating neural networks with advanced optimization techniques for accurate kidney disease diagnosis

Kidney diseases pose a significant global health challenge, and precise diagnostic tools are needed to improve patient outcomes. This study addresses this need by investigating three main categories of renal disease: kidney stones, cysts, and tumors. Using a dataset of 12,446 CT whole-abdomen and urogram images, the study developed an AI-driven diagnostic system tailored to kidney disease classification. The approach combines a traditional convolutional neural network architecture (AlexNet) with the modernized ConvNeXt architecture. By integrating AlexNet’s robust feature extraction with ConvNeXt’s modern convolutional design, the proposed model achieved a classification accuracy of 99.85%. A key element of the methodology is the strategic fusion of features from both networks: the paper concatenated hierarchical spatial information and incorporated self-attention mechanisms to enhance classification performance. The study also introduced a custom optimization technique inspired by the Adam optimizer that dynamically adjusts the step size based on gradient norms. This tailored optimizer enabled faster convergence and more effective weight updates, improving model performance. The model performed strongly across various metrics, with an average precision of 99.89%, recall of 99.95%, and specificity of 99.83%. These results highlight the efficacy of the hybrid architecture and optimization strategy for diagnosing kidney diseases. The methodology also emphasizes interpretability and explainability, which are crucial for the clinical deployment of deep learning models.

The main contributions of this study are as follows:
• Novel classification method: the paper proposes a new approach for classifying kidney diseases that demonstrates robust performance, emphasizing the interpretability and explainability needed for clinical applications.
• Advanced integration of neural networks: this study integrates features from AlexNet and ConvNeXt to create a comprehensive and informative feature representation. This fusion leverages the strengths of both architectures and outperforms the individual models.
• Enhanced model performance: by combining AlexNet and ViT, the paper achieved improved discriminative ability, capturing a broader range of visual features and surpassing the performance of the individual models.
• Optimized training process: this study introduced a custom optimization technique based on Adam that dynamically adjusts the step size according to the gradient norm, leading to more efficient convergence when training the merged AlexNet and ConvNeXt models.
The rest of the paper is organized as follows. The next section reviews the literature. Sect. "Motivation" presents the motivation, Sect. "Proposed methodology" describes the proposed methodology, and Sect. "Experiments and results" reports the experiments and results. Finally, Sect. "Conclusions" concludes the paper and outlines future work.

Literature reviews
The classification of kidney diseases is a pivotal research area with significant implications for clinical diagnosis, treatment planning, and patient management. As the understanding of renal disorders evolves, a growing body of literature has explored methods for accurate and efficient kidney disease classification. This review surveys that literature, from traditional methods to recent advances in machine learning and deep learning. It aims to distill key trends in diagnostic accuracy and to identify avenues for future research and technological innovation in this domain.
Parakh et al. 7 proposed a cascade of two convolutional neural networks (CNNs). The first delineated the extent of the urinary tract, and the second identified the presence of stones. The authors created nine model variants by combining different training data sources (S1, S2, or both, denoted SB) with CNNs pre-trained on ImageNet or GrayNet, or without pre-training (Random). GrayNet-SB achieved an accuracy of 95%, surpassing ImageNet-SB (91%) and Random-SB (88%).
Kuo et al. 8 aimed to improve the prediction of kidney function and chronic kidney disease (CKD) from kidney ultrasound images. They developed a model based on the ResNet architecture, pre-trained on ImageNet, to estimate the glomerular filtration rate (eGFR) and CKD status from 4505 labeled kidney ultrasound images. The model showed a strong correlation (Pearson coefficient of 0.741) between AI-based and creatinine-based GFR estimates. It also achieved 85.6% accuracy in classifying CKD status, outperforming experienced nephrologists (60.3% to 80.1%).
Sudharson et al. 9 used an ensemble of pre-trained deep neural networks (DNNs), namely ResNet-101, ShuffleNet, and MobileNet-v2, and determined the final predictions by majority voting. The ensemble reached a peak classification accuracy of 96.54% on high-quality test images and 95.58% on noisy test images.
Aksakallı et al. 10 examined diverse machine learning approaches. These included Decision Trees (DT), Random Forest (RF), Support Vector Machines (SVC), Multilayer Perceptron (MLP), K-Nearest Neighbor (kNN), Naive Bayes (BernoulliNB), and deep neural networks based on Convolutional Neural Networks (CNN). The Decision Tree classifier yielded the best classification results, attaining the highest F1 score, with a success rate of 85.3% when using the S + U sampling method.
Liu et al. 11 aimed to make deep learning more accessible to clinical users in microscopic image classification. To that end, they developed AIMIC, out-of-the-box software that requires no programming knowledge. AIMIC integrates advanced deep learning methods and data preprocessing techniques, allowing users to train new networks and run inference on unseen samples seamlessly. Evaluated on four benchmark microscopy image datasets, the platform proved effective in helping entry-level practitioners select suitable algorithms. The ResNeXt-50-32 × 4d model achieved the highest performance, with an average accuracy of 96.83% and an average F1-score of 96.82%, making it the preferred choice for microscopic image classification. MobileNet-V2 offered a good balance between accuracy (95.72%) and computational cost, with an inference time of 0.109 s per sample, making it a viable option when computing resources are limited.
Srivastava et al. 12 applied machine learning models (SVM, KNN, Random Forest, Decision Tree, and AdaBoost) to a normalized dataset and achieved an accuracy of 98.75%. Baygin et al. 13 proposed ExDark19, a transfer learning-based image classification method for kidney stone detection. It uses iterative neighborhood component analysis (INCA) to select the most informative feature vectors, which are then fed into a k-nearest neighbor (kNN) classifier. The method achieved an accuracy of 99.22% with ten-fold cross-validation and 99.71% with hold-out validation.
Nazmul Islam et al. 14 employed six machine learning models. Three were based on advanced variants of Vision Transformers (EANet, CCT, and Swin Transformer). The other three were deep learning architectures (ResNet, VGG16, and Inception v3) with modified final layers. Although VGG16 and CCT performed well, the Swin Transformer achieved the highest accuracy, 99.30%. In this investigation, diverse physiological parameters were considered alongside various machine learning (ML) techniques. Different ML models, including Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Random Forest, Decision Tree, and AdaBoost, were trained on a normalized dataset. They reached an accuracy of 98.75%, a sensitivity of 100%, a specificity of 96.55%, and an F1 score of 99.03%.
Subedi et al. 15 explored the Vision Transformer (ViT), which adapts the Transformer architecture originally designed for natural language processing (NLP) tasks and shows promise for medical image classification. They coupled ViT with Fully Connected Networks (FCN), combining the feature extraction capabilities of ViT with the classification ability of the FCN. The combination detected kidney-related issues with an accuracy of 99.64%.
Asif et al. 16 introduced "StoneNet", a MobileNet-based model that uses depthwise separable convolutions. It offers a low-cost alternative to existing models, which suffer from drawbacks such as high computational cost and long training times. StoneNet achieved an accuracy of 97.98%, with short training and testing times of 996.88 s and 14.62 s, respectively. Qadir et al. 17 used the DenseNet-201 model for feature extraction, with Random Forest as the chosen classifier, and achieved an accuracy of 99.719%. Sasikaladevi et al. 18 addressed the critical need for early, automatic detection of chronic kidney disease (CKD) from radiology images using deep learning. Their dataset contains 12,446 unique CT scan images. Deep features were extracted from these images, and hyperedges were generated to construct hypergraphs representing the renal images. These hypergraphs were then used in a hypergraph convolutional neural network for representation learning. The model was validated on a hold-out dataset using precision, recall, accuracy, and F1 score, and it achieved a validation accuracy of 99.71%, outperforming other state-of-the-art algorithms. This digital-twin model facilitates early diagnosis of kidney diseases and helps nephrologists better assess the prognosis of kidney-related abnormalities.

Motivation
The urgent need to improve patient care and medical diagnostics in renal health is the driving force behind this kidney classification study. Kidney diseases, both acute and chronic, are a major global health concern. Timely and accurate categorization of these conditions is essential for developing effective treatment strategies and for monitoring patients.
Several factors motivate kidney classification research:
• Clinical Importance: Diagnosing kidney disorders accurately can be challenging because of their wide range of etiologies and symptoms. Better classification techniques help medical professionals understand the various kidney disorders and tailor treatment plans to individual disease profiles.

Proposed methodology
This paper examines how feature concatenation improves the accuracy of kidney disease classification by merging AlexNet 19 with other models (ViT 20, Swin 21, and ConvNeXt 22). It also examines the effect of replacing the popular Adam optimizer with a modified version, "Custom_Adam". The models were also compared with VGG and ResNet, architectures more recent than AlexNet. The pre-trained VGG and ResNet models achieved accuracies of 91.73% and 94.63%, respectively. The more advanced Vision Transformer (ViT), Swin Transformer, and ConvNeXt models achieved higher accuracies of 98.71%, 96.44%, and 96.44%, respectively. These findings highlight the superior performance of these newer architectures over AlexNet. Nevertheless, AlexNet has a well-established reputation in image classification, and its architecture is known for efficient feature extraction, which is crucial for accurately classifying kidney diseases from medical images.
Transformer models such as ViT and Swin have performed remarkably well in various computer vision tasks, particularly in capturing long-range dependencies and spatial relationships within images. ViT was chosen because its self-attention mechanism captures global contextual information in images, enabling it to identify complex patterns and long-range relationships. Swin, by contrast, reduces the cost of attention by computing self-attention only within non-overlapping local windows. It shifts the window partition between successive layers so that information can still flow across windows. This shifted-window approach reduces the quadratic complexity of ViT's global attention to complexity that is linear in image size, making Swin more computationally efficient. Swin is also a hierarchical vision transformer that progressively merges adjacent patches as the network deepens. This hierarchy lets the model handle features at various scales, improving the learning of robust and discriminative features compared with convolutional neural networks. ConvNeXt incorporates modern techniques such as a hierarchical design and larger kernel sizes, enhancing its ability to handle diverse image features while keeping the simplicity of traditional CNNs. The paper included these models to explore how well they extract relevant features from medical images, which could help improve diagnostic accuracy.
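Counting the query-key pairs each attention scheme scores makes the quadratic-versus-linear argument concrete. The minimal sketch below assumes a square token grid and the window size of 7 used in the original Swin Transformer. It counts only attention pairs and ignores heads, channel width, and the window shift, which leaves the pair count unchanged. The function names are illustrative.

```python
def global_attention_pairs(h: int, w: int) -> int:
    """Query-key pairs scored by global self-attention over an h x w grid of tokens (ViT-style)."""
    n = h * w
    return n * n  # every token attends to every token: grows as n^2


def window_attention_pairs(h: int, w: int, window: int = 7) -> int:
    """Query-key pairs when attention is confined to non-overlapping window x window blocks (Swin-style)."""
    if h % window or w % window:
        raise ValueError("token grid must be divisible by the window size")
    n = h * w
    return n * window * window  # each token attends only within its own window: grows as n * M^2


if __name__ == "__main__":
    for side in (28, 56, 112):
        full = global_attention_pairs(side, side)
        local = window_attention_pairs(side, side)
        print(f"{side}x{side} tokens: global={full:,} windowed={local:,} ratio={full // local}x")
```

Doubling the grid side multiplies the global cost by 16 but the windowed cost only by 4. This is the quadratic-versus-linear gap that motivates Swin's design.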
Changing the optimizer can also significantly affect model accuracy, convergence speed, generalization, and overall stability, so choosing the right optimizer is important when optimizing machine learning models. The paper compared Adam 23 with the Custom_Adam optimizer on the dataset and found Custom_Adam to be better in most cases. The primary difference between the two lies in the additional computation and use of the gradient norm in the custom version. Specifically, for each parameter θ with a non-None gradient, Custom_Adam computes the norm of the gradient, denoted norm_value:

$$\text{norm\_value} = \lVert g_t \rVert$$

This norm is then used in the custom update rule. The _update_rule method of Custom_Adam takes norm_value together with the parameter θ, the gradient g_t, and the optimizer state, which can be expressed as:

$$\theta_t = \texttt{\_update\_rule}\left(\theta_{t-1},\ g_t,\ \text{state},\ \text{norm\_value}\right) \tag{1}$$

The parameter update in the standard Adam is:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

Additionally, Custom_Adam overrides the step method to include the gradient-norm calculation and the call to _update_rule. The standard Adam optimizer uses its default step method without these extra computations. This enhancement lets Custom_Adam adapt the step size to the scale of the gradient, potentially improving optimization performance. The procedure is summarized in Algorithm Custom_Adam.
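The paper does not spell out the body of _update_rule. The pure-Python sketch below therefore illustrates only the structure described above: a standard bias-corrected Adam update plus a hook that receives norm_value. The names update_rule, adam_step, and custom_adam_step are hypothetical, and so is the rescaling rule step_size = lr / max(1, norm_value). They are illustrative choices, not the authors' implementation. A PyTorch version would instead subclass torch.optim.Adam and override step, as the text describes.

```python
import math
from typing import Dict, List, Optional, Tuple


def update_rule(theta: List[float], grad: List[float], state: Dict,
                norm_value: Optional[float], lr: float = 1e-3,
                betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-8) -> List[float]:
    """Bias-corrected Adam update; if norm_value is given, the step size is rescaled by it."""
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    m = state.setdefault("exp_avg", [0.0] * len(theta))      # first-moment estimates m_t
    v = state.setdefault("exp_avg_sq", [0.0] * len(theta))   # second-moment estimates v_t

    step_size = lr
    if norm_value is not None:
        # HYPOTHETICAL rule, not taken from the paper: shrink the step when ||g_t|| > 1.
        step_size = lr / max(1.0, norm_value)

    new_theta = []
    for i, (p, g) in enumerate(zip(theta, grad)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat = m[i] / (1 - beta1 ** t)   # bias correction
        v_hat = v[i] / (1 - beta2 ** t)
        new_theta.append(p - step_size * m_hat / (math.sqrt(v_hat) + eps))
    return new_theta


def adam_step(theta, grad, state, **kwargs):
    """Standard Adam: the update ignores the gradient norm."""
    return update_rule(theta, grad, state, norm_value=None, **kwargs)


def custom_adam_step(theta, grad, state, **kwargs):
    """Custom_Adam-style step: compute the gradient norm, then hand it to the update rule."""
    norm_value = math.sqrt(sum(g * g for g in grad))  # Euclidean norm of this parameter's gradient
    return update_rule(theta, grad, state, norm_value=norm_value, **kwargs)


if __name__ == "__main__":
    theta, grad = [1.0, -2.0], [3.0, -4.0]
    print(adam_step(theta, grad, {}))         # about [0.999, -1.999]
    print(custom_adam_step(theta, grad, {}))  # about [0.9998, -1.9998], since ||g|| = 5
```

On the first step, standard Adam moves each coordinate by roughly the learning rate. The hypothetical norm-scaled variant divides that step by the gradient norm (here 5).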

Algorithm: Custom_Adam

To accomplish these aims, two experiments were conducted. The first compared four single vision models (ViT, AlexNet, Swin, and ConvNeXt) as image feature extractors, each trained with both the Adam and the Custom_Adam optimizer. The second improved the feature-extraction process by concatenating features from pairs of these models ("Swin + ConvNeXt", "AlexNet + ViT", "AlexNet + Swin", and "AlexNet + ConvNeXt"), using the best optimizer from the first experiment. As Fig. 3 shows, concatenating AlexNet with ConvNeXt and training with the Custom_Adam optimizer gives the best accuracy, 99.85%. Its average precision, recall, and specificity reach 99.89%, 99.95%, and 99.83%, respectively.
The methodology of this study for kidney classification involves the following steps, as shown in Fig. 1:
1. Load the images from the directory and augment the training data with T.Compose. The transformations include random horizontal and vertical flips, random color jitter, resizing to 256 × 256 pixels, center cropping to 224 × 224 pixels, conversion to a PyTorch tensor, normalization with the ImageNet mean and standard deviation, and random erasing with a probability of 0.1.
2. Load the pre-trained models (AlexNet and ConvNeXt) and freeze their parameters. Then create a new model that concatenates the output features of the two models and adds a classifier layer.
3. Define a custom optimizer class that inherits from Adam and includes the modifications.
4. Define functions that return data loaders for training and validation, implementing data loading and augmentation for the training and validation sets.
5. Define the training loop using the optimizer and the "CrossEntropyLoss" loss.
6. Evaluate the model using the confusion matrix and the learning curves for loss and accuracy.
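Step 2 (frozen backbones, feature concatenation, a trainable classifier) and step 5 (cross-entropy loss) can be illustrated without a deep-learning library. In the minimal sketch below, the pretrained AlexNet and ConvNeXt backbones are replaced by hypothetical stand-ins: fixed random projections followed by ReLU. The tiny dimensions (16-dimensional input, 8 + 4 fused features, 4 classes) are also illustrative. In the paper's PyTorch setting, only the classifier weights would receive gradient updates.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def make_frozen_extractor(in_dim: int, out_dim: int, seed: int) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a frozen pretrained backbone: fixed random projection + ReLU."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def extract(x: Vector) -> Vector:
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return extract


class ConcatClassifier:
    """Concatenates the features of two frozen extractors and applies one trainable linear layer."""

    def __init__(self, extractor_a, extractor_b, feat_dim: int, num_classes: int, seed: int = 0):
        self.extractor_a = extractor_a
        self.extractor_b = extractor_b
        rng = random.Random(seed)
        # Only these classifier weights and biases would be updated during training.
        self.weight = [[rng.uniform(-0.01, 0.01) for _ in range(feat_dim)] for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def features(self, x: Vector) -> Vector:
        return self.extractor_a(x) + self.extractor_b(x)  # list "+" concatenates the feature vectors

    def logits(self, x: Vector) -> Vector:
        f = self.features(x)
        if len(f) != len(self.weight[0]):
            raise ValueError("feat_dim must equal the sum of both extractors' output sizes")
        return [sum(w * fi for w, fi in zip(row, f)) + b for row, b in zip(self.weight, self.bias)]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z: Sequence[float], target: int) -> float:
    """Cross-entropy of one sample's logits against its integer class label."""
    return -math.log(softmax(z)[target])


if __name__ == "__main__":
    model = ConcatClassifier(make_frozen_extractor(16, 8, seed=1),
                             make_frozen_extractor(16, 4, seed=2),
                             feat_dim=12, num_classes=4)
    x = [0.1 * k for k in range(16)]  # toy input in place of a preprocessed CT image
    z = model.logits(x)
    print(len(model.features(x)), [round(p, 3) for p in softmax(z)], cross_entropy(z, target=2))
```

Keeping the backbones fixed means the fused representation is computed once per image, and training reduces to fitting the small linear head on top of it.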

Dataset
The dataset used in this paper originated from various hospitals in Dhaka, Bangladesh, where patients had previously been diagnosed with kidney tumors, cysts, normal conditions, or stones. The data were gathered from the Picture Archiving and Communication System (PACS). They include both coronal and axial cuts from contrast and non-contrast studies covering the whole abdomen and urogram. Patient information and metadata were then removed from the DICOM images, which were converted to a lossless JPG format. To ensure accuracy, each image finding was verified by both a radiologist and a medical technologist after the conversion process 14.

Experiments and results
This study used the 12,446 CT images assembled and annotated by Nazmul Islam et al. 14.

Performance evaluation methods
The eight models are evaluated on accuracy, sensitivity (recall), and precision (positive predictive value, PPV). These metrics are computed from the numbers of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) samples. Recall is the number of true positives divided by the sum of true positives and false negatives. In medical diagnosis, high recall is imperative for correctly identifying patients with the disease, because missing a positive case can lead to serious consequences such as misdiagnosis and delayed treatment. Precision measures the proportion of predicted positives that are truly positive. It is the number of true positives divided by the sum of true positives and false positives, and high precision is also highly desirable in medical imaging. The F1 score, the harmonic mean of precision and recall, is derived for all models from these two values. Accuracy, precision, recall (sensitivity), and specificity are calculated for each class i as follows 24:

$$Accuracy_i = \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}, \qquad Precision_i = \frac{TP_i}{TP_i + FP_i}$$

$$Recall_i = \frac{TP_i}{TP_i + FN_i}, \qquad Specificity_i = \frac{TN_i}{TN_i + FP_i}$$
Here, i is the kidney class (Cyst, Normal, Stone, or Tumor), TP = true positive, FP = false positive, FN = false negative, and TN = true negative. Table 2 compares the single vision models trained with the Adam and Custom_Adam optimizers on the four kidney classes. The comparison covers accuracy, precision, recall, and F-score, together with the average over the four classes.
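The per-class formulas can be computed one-vs-rest from a confusion matrix. The minimal sketch below assumes rows are true classes and columns are predicted classes. The confusion matrix is made up for illustration and is not a result from the paper.

```python
from typing import Dict, Sequence


def per_class_metrics(cm: Sequence[Sequence[int]], labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest metrics from a confusion matrix (rows = true class, columns = predicted class)."""
    total = sum(sum(row) for row in cm)
    out: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                  # class-i samples predicted as another class
        fp = sum(row[i] for row in cm) - tp   # other-class samples predicted as class i
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out[label] = {"accuracy": (tp + tn) / total, "precision": precision,
                      "recall": recall, "specificity": specificity, "f1": f1}
    return out


def macro_average(metrics: Dict[str, Dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric over all classes."""
    return sum(m[key] for m in metrics.values()) / len(metrics)


labels = ["Cyst", "Normal", "Stone", "Tumor"]
cm = [[10, 0, 0, 0],   # illustrative counts only
      [0, 9, 1, 0],
      [0, 0, 5, 0],
      [1, 0, 0, 4]]
metrics = per_class_metrics(cm, labels)
overall_accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))

if __name__ == "__main__":
    for label, m in metrics.items():
        print(label, {k: round(v, 4) for k, v in m.items()})
    print("macro precision:", round(macro_average(metrics, "precision"), 4))
```

Note that the per-class accuracy of a one-vs-rest split differs from the overall accuracy, which counts only the diagonal of the matrix.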
Table 2 summarizes how well each model, trained with each optimizer, distinguishes between the four classes: Cyst, Normal, Stone, and Tumor. The Vision Transformer (ViT) models, with both the Adam and Custom_Adam optimizers, consistently achieve high accuracy, precision, and recall across the classes, showing their effectiveness for image classification. The Swin and ConvNeXt models also perform well, with high accuracy and stable precision and recall. The AlexNet models lag slightly in accuracy but remain competitive. ViT with the Adam optimizer achieves consistently high accuracy across all classes, making it a strong contender. Because both precision and recall are critical in medical imaging, a model that balances the two may be preferable.
Figs. 6, 7, 8, and 9 show the confusion matrices for the best configuration of each of the four individual vision models, trained with either Adam or Custom_Adam.
Visualizing class-wise error rates is also essential for evaluating image classification models. Overall accuracy aggregates performance across all classes. Class-wise error rates, by contrast, expose disparities in performance between categories and so give a more detailed picture of each model's behavior. Figs. 10, 11, 12, and 13 show the class-wise error rates for the best four models used in this paper, and Fig. 14 summarizes the comparison among them. As Fig. 14 shows, all models achieve near-perfect performance, with the second model (Swin with the Custom_Adam optimizer) achieving perfect classification. The third model (AlexNet with the Custom_Adam optimizer) shows the highest error rates, while the second and fourth models perform best. Overall, the best model appears to be ViT with the Adam optimizer. It achieves the lowest error rates across most classes and consistently classifies 'Cyst', 'Normal', 'Stone', and 'Tumor' samples well.
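The paper does not give a formula for the class-wise error rate. This sketch assumes a common definition: the fraction of samples of each true class that are misclassified, which equals one minus that class's recall. The confusion matrix below is illustrative, with rows as true classes and columns as predictions.

```python
from typing import Dict, Sequence


def class_wise_error_rates(cm: Sequence[Sequence[int]], labels: Sequence[str]) -> Dict[str, float]:
    """Assumed definition: misclassified samples of a true class / all samples of that class."""
    rates = {}
    for i, label in enumerate(labels):
        support = sum(cm[i])                  # number of samples whose true class is i
        rates[label] = (support - cm[i][i]) / support if support else 0.0
    return rates


labels = ["Cyst", "Normal", "Stone", "Tumor"]
cm = [[10, 0, 0, 0],   # illustrative counts only
      [0, 9, 1, 0],
      [0, 0, 5, 0],
      [1, 0, 0, 4]]
error_rates = class_wise_error_rates(cm, labels)

if __name__ == "__main__":
    for label, rate in error_rates.items():
        print(f"{label:<7}{rate:.1%}")
```

A model can have high overall accuracy yet a noticeably higher error rate on a small class, which is what this per-class view reveals.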
Table 3 compares the concatenated vision models trained with the Adam and Custom_Adam optimizers on the four kidney classes. The comparison covers accuracy, precision, recall, and F-score, together with the average over the four classes.
Table 3 shows the effect of concatenating features between models. AlexNet + ConvNeXt with Custom_Adam stands out with the highest accuracy, 99.85%. It also has consistently high average precision (0.9989) and recall (0.9995), indicating robust performance across all classes. Swin + ConvNeXt with Custom_Adam has the lowest accuracy among these models, 98.75%, although its balanced precision and recall suggest that it remains effective across classes. Across the concatenated models, Custom_Adam, with its dynamic step-size adjustment based on the gradient norm, consistently outperforms standard Adam in accuracy, precision, recall, and F1-score. This holds particularly for the combinations that pair AlexNet with another backbone. Swin + ConvNeXt gives weaker results with both Custom_Adam and Adam. This may be because combining two different architectures makes the merged model more complex. In addition, if the gradient flow between Swin and ConvNeXt is not well aligned, gradients may not propagate effectively during training, leading to convergence difficulties.
Figs. 15, 16, 17, and 18 show the confusion matrices for the best configurations of the four concatenated vision models trained with Adam or Custom_Adam. Figs. 19, 20, 21, and 22 show the class-wise error rates of the best concatenated models, and Fig. 23 summarizes the comparison of class-wise error rates among the four best concatenated models.

Number of parameters of the different models
The number of parameters strongly affects a neural network's capacity, efficiency, and flexibility. Deep learning models consist of several layers, each with weights and biases that add to the total parameter count. Larger, more heavily parameterized models usually have greater representational capacity, which lets them learn complex features and relationships in the data. Conversely, more compact models with fewer parameters can be less prone to overfitting and more computationally efficient, which makes them suitable for tasks with limited data. Table 4 lists the total and trainable parameters of the single and concatenated models used in this paper. The number of trainable parameters is generally more meaningful than the total, because some parameters may be fixed (non-trainable). As Table 4 shows, Swin has the fewest parameters and AlexNet + ConvNeXt has the most. Larger parameter counts are often, though not always, associated with higher accuracy. The progression from the smallest to the largest model may therefore reflect an increase in model capacity and, potentially, accuracy, as in Fig. 24.
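The gap between total and trainable parameters becomes clear when the backbones are frozen and only a classifier head is trained. In PyTorch, the usual idiom for counting trainable parameters is `sum(p.numel() for p in model.parameters() if p.requires_grad)`. The pure-Python sketch below does the same bookkeeping. All sizes are hypothetical placeholders. The fused width 4096 + 768 matches the penultimate widths of torchvision's AlexNet and ConvNeXt-Tiny, but the paper does not state which layers it concatenates.

```python
from typing import Iterable, Tuple


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameters of a fully connected layer: a weight matrix plus an optional bias vector."""
    return in_features * out_features + (out_features if bias else 0)


def count_parameters(layers: Iterable[Tuple[str, int, bool]]) -> Tuple[int, int]:
    """Return (total, trainable) for (name, num_params, is_trainable) entries."""
    layers = list(layers)
    total = sum(n for _, n, _ in layers)
    trainable = sum(n for _, n, is_trainable in layers if is_trainable)
    return total, trainable


# Hypothetical sizes: two frozen backbones and a trainable head on 4096 + 768 fused features.
head = linear_params(4096 + 768, 4)
layers = [
    ("backbone_a (frozen)", 1_000_000, False),
    ("backbone_b (frozen)", 2_000_000, False),
    ("classifier head", head, True),
]
total, trainable = count_parameters(layers)

if __name__ == "__main__":
    print(f"head={head:,} total={total:,} trainable={trainable:,}")
```

With frozen backbones, the trainable count is set entirely by the head. That is why the two columns of a parameter table can differ by orders of magnitude.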

Time evaluation
For each of the 8 tested single-model configurations, the study compared training times to identify the fastest. As shown in Fig. 25, AlexNet with the Adam optimizer trained fastest, taking the least time (50 minutes) with an accuracy of 96.32%, followed by Swin with the Adam optimizer, which took

Conclusions
This study explored the impact of feature concatenation and optimizer selection on neural network performance. The experimental results show that concatenating features, as in AlexNet + ConvNeXt, together with the Custom_Adam optimizer achieved an accuracy of 99.85%. This highlights the benefit of integrating diverse model architectures and optimization strategies to capture complex patterns and correlations in the data. The Custom_Adam optimizer outperformed the standard Adam optimizer across all concatenated models in accuracy, precision, recall, and F1-score. Its effect was particularly notable when paired with Transformer models, where dynamic step-size adjustment based on gradient norms contributed to consistently high average recall and accuracy. The trade-off between model capacity and efficiency was evident: the Swin model performed competitively despite having fewer parameters. This underscores its utility when computational efficiency and reduced overfitting are critical. Larger models such as AlexNet + ConvNeXt achieved higher accuracy, but the Swin + AlexNet combination offered a balanced approach, with a training time of 2 h 30 min and an accuracy of 99.78%. Conversely, the AlexNet + ViT configuration achieved 99.74% accuracy but required the longest training time, approximately 7 h.
The dataset contains 12,446 unique images: 3,709 cyst, 5,077 normal, 1,377 stone, and 2,283 tumor. Fig. 2 shows sample images from the dataset.
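The class counts above are imbalanced, which matters when reading the accuracy figures. This short sketch uses only the reported counts. It computes each class's share and the accuracy of a trivial baseline that always predicts the majority class.

```python
counts = {"Cyst": 3709, "Normal": 5077, "Stone": 1377, "Tumor": 2283}  # counts reported for the dataset

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}
majority_label = max(counts, key=counts.get)
majority_baseline_accuracy = counts[majority_label] / total  # always predict the largest class

if __name__ == "__main__":
    for label, n in counts.items():
        print(f"{label:<7}{n:>6}{shares[label]:>8.2%}")
    print(f"majority-class baseline ({majority_label}): {majority_baseline_accuracy:.2%}")
```

Always predicting "Normal" would already reach about 40.8% accuracy, and "Stone" makes up only about 11% of the images. Per-class recall and error rates are therefore more informative than accuracy alone.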

Fig. 1. The methodology of this study for kidney classification.

Fig. 10. Class-wise error rate for the ViT with Adam optimizer model.

Fig. 14. Class-wise error rate for the best four models.

Fig. 23. Class-wise error rate for the best four concatenated models.

Fig. 24. Trainable parameters for the models used in the paper.

Fig. 25. Time evaluation for training the eight single models.

Table 1 presents related work on kidney classification.
Table 1. Related work in kidney classification using different datasets. Significant values are in [bold].
Accuracy = 99.85%; precision, recall, and specificity of 99.89%, 99.95%, and 99.83%, respectively.
• Early Identification and Intervention: Identifying kidney disorders early is critical for launching prompt interventions that can halt disease progression and improve patient outcomes. Classification models can help detect kidney function issues early, enabling more proactive and focused medical interventions.
• Application of Advanced Technologies: Advances in machine learning, deep learning, and image processing make it possible to develop complex models for classifying renal disease. Using these technologies could substantially improve the accuracy and effectiveness of diagnostic procedures.
$$F1_i = 2 \times \frac{Precision_i \times Recall_i}{Precision_i + Recall_i}$$

Table 2. Comparison between the individual models using the Adam and Custom_Adam optimizers. Significant values are in [bold].

Table 4. Number of parameters of the single and concatenated models. Significant values are in [bold].